MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems
Abstract
MOTIVATION: Biologists often wish to use their knowledge of a few experimental models of a given molecular system to identify homologs in genomic data. We developed a generic tool for this purpose.

RESULTS: Macromolecular System Finder (MacSyFinder) provides a flexible framework to model the properties of molecular systems (cellular machinery or pathways), including their components, their evolutionary associations with other systems, and their genetic architecture. Modelled features also include functional analogs and the use of the same component by several different systems. Models are used to search for molecular systems in complete genomes or in unstructured data such as metagenomes. The components of the systems are searched for by sequence similarity using hidden Markov model (HMM) protein profiles. Hits are assigned to a given system when they comply with the content and organization specified in the system model. A graphical interface, MacSyView, facilitates the analysis of the results by showing overviews of component content and genomic context. To exemplify the use of MacSyFinder, we built models to detect and classify CRISPR-Cas systems following a previously established classification. We show that MacSyFinder makes it easy to define an accurate "Cas-finder" using publicly available protein profiles.

AVAILABILITY AND IMPLEMENTATION: MacSyFinder is a standalone application implemented in Python. It requires Python 2.7, HMMER and makeblastdb (version 2.2.28 or higher). It is freely available with its source code under a GPLv3 license at https://github.com/gem-pasteur/macsyfinder. It is compatible with all platforms supporting Python and HMMER/makeblastdb. The "Cas-finder" (models and HMM profiles) is distributed as a compressed tarball archive as Supporting Information.
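The assignment step described in the abstract (deciding whether a set of profile hits complies with a model's content and organization) can be illustrated with a minimal Python sketch. This is not MacSyFinder's code or API. The names `Hit`, `cluster_hits`, `satisfies_model` and `find_systems`, the parameters `max_gap`, `min_mandatory` and `min_total`, and the split into "mandatory" and "accessory" components are all assumptions made for illustration, and the sketch assumes the HMM search has already produced a list of hits with gene positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One profile match (hypothetical structure, for illustration only)."""
    gene: str      # name of the component whose profile matched
    position: int  # rank of the matching protein along the replicon


def cluster_hits(hits, max_gap):
    """Group hits whose consecutive positions differ by at most max_gap."""
    clusters = []
    for hit in sorted(hits, key=lambda h: h.position):
        if clusters and hit.position - clusters[-1][-1].position <= max_gap:
            clusters[-1].append(hit)   # close enough: extend current cluster
        else:
            clusters.append([hit])     # too far (or first hit): new cluster
    return clusters


def satisfies_model(cluster, mandatory, accessory, min_mandatory, min_total):
    """Check the content rule: enough distinct mandatory and total components."""
    genes = {h.gene for h in cluster}
    n_mandatory = len(genes & mandatory)
    n_total = n_mandatory + len(genes & accessory)
    return n_mandatory >= min_mandatory and n_total >= min_total


def find_systems(hits, mandatory, accessory, max_gap, min_mandatory, min_total):
    """Return the clusters that satisfy both the organization and content rules."""
    return [
        cluster
        for cluster in cluster_hits(hits, max_gap)
        if satisfies_model(cluster, mandatory, accessory, min_mandatory, min_total)
    ]
```

With a toy model in which `cas1`, `cas2` and `cas9` are all required and hits may be at most three genes apart, calling `find_systems` on hits at positions 10, 11, 13 and 200 keeps the first three as one candidate system and discards the isolated hit. The gene sets and thresholds here are placeholders, not the definitions used in the published "Cas-finder" models.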